Breast and Lung Anticancer Peptides Classification Using N-Grams and Ensemble Learning Techniques

Abstract

Anticancer peptides (ACPs) are short protein sequences; they perform functions similar to some hormones and enzymes inside the body. The role of any protein or peptide is related to its structure and to the sequence of amino acids that make it up. There are 20 types of amino acids in humans, and each of them has a particular characteristic according to its chemical structure. Current machine learning and deep learning models have been used to classify ACPs. However, these models have neglected Amino Acid Repeats (AARs), which play an essential role in the function of peptides. Therefore, this paper offers a promising route for identifying novel anticancer peptides by extracting AARs based on N-Grams and k-mers, using two peptide datasets. These datasets, which target breast and lung cancer cells, were assembled and curated manually from the Cancer Peptide and Protein Database (CancerPPD). Each dataset consists of peptide sequences together with their synthesis and anticancer activity on the corresponding cell lines. Five different feature selection methods were used to improve classification performance and reduce experimental costs. The ACPs were then classified with four classifiers, namely AdaBoost, Random Forest Tree (RFT), Multi-class Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP), which were evaluated by applying five well-known evaluation metrics. Experimental results showed that the classification process reached accuracies of 89.25% and 92.56% for breast and lung ACPs, respectively, and AUC values of 95.35% and 96.92%, respectively. The proposed classifiers performed competently and roughly equally in terms of accuracy, precision, F-measure, and recall, except for SVM-based feature selection, which showed superior performance. As a result, the approach significantly improved predictive performance and can effectively classify ACPs as virtual inactive, moderately active, or very active.
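The abstract does not specify exactly how the N-gram/k-mer features were computed. The sketch below is a minimal illustration that assumes overlapping k-mers over the 20 standard amino acids and normalized frequency vectors. This is our assumption and not necessarily the authors' encoding. The function names (`kmer_counts`, `kmer_vector`, `repeated_kmers`) and the example peptide are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes); assumed alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def kmer_counts(sequence: str, k: int) -> Counter:
    """Count overlapping k-mers (N-grams of length k) in a peptide sequence."""
    if k < 1:
        raise ValueError("k must be at least 1")
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_vector(sequence: str, k: int) -> list[float]:
    """Normalized frequencies over all 20**k possible k-mers, in a fixed order."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    vocab = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    return [counts[m] / total if total else 0.0 for m in vocab]

def repeated_kmers(sequence: str, k: int) -> dict[str, int]:
    """k-mers occurring more than once: a simple proxy for amino acid repeats."""
    return {m: c for m, c in kmer_counts(sequence, k).items() if c > 1}

if __name__ == "__main__":
    peptide = "KLAKLAKKLAKLAK"  # illustrative sequence only
    print(repeated_kmers(peptide, 3))
    print(len(kmer_vector(peptide, 2)))
```

In this example, `repeated_kmers("KLAKLAKKLAKLAK", 3)` gives `{'KLA': 4, 'LAK': 4, 'AKL': 2}`. The vocabulary grows as 20^k, reaching 8,000 dimensions at k = 3. That growth is one reason a feature selection step would be useful before passing such vectors to classifiers like AdaBoost, Random Forest, SVM, or MLP.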

Similar sources

Protein classification using modified n-grams and skip-grams.

Motivation: Classification by supervised machine learning greatly facilitates the annotation of protein characteristics from their primary sequence. However, the feature generation step in this process requires detailed knowledge of attributes used to classify the proteins. Lack of this knowledge risks the selection of irrelevant features, resulting in a faulty model. In this study, we introduce...

AdaBoost Ensemble Algorithms for Breast Cancer Classification

With advances in technology, different tumor features have been collected for Breast Cancer (BC) diagnosis, and processing such large data sets poses challenges, including high storage capacity and the time required to access and process them. The objective of this paper is to classify BC based on the extracted tumor features. To extract useful information and diagnose the tumo...

The Clustering and Classification Data Mining Techniques in Insurance Fraud Detection: The Case of Iranian Car Insurance

Given the ever-increasing spread of fraud in insurance, especially car insurance, and its negative consequences for insurance companies, applying suitable and efficient methods to identify and detect fraud in this area is essential. Understanding the patterns in data on previously reported claims can help determine whether a damage claim is genuine. One of the most common and widely used ways of discovering patterns in data is the use of...

Ensemble-based Author Identification Using Character N-grams

This paper deals with the problem of identifying the most likely author of a text. Several thousands of character n-grams, rather than lexical or syntactic information, are used to represent the style of a text. Thus, the author identification task can be viewed as a single-label multiclass classification problem of high dimensional feature space and sparse data. In order to cope with such prop...

The Relationship Between Using Language Learning Strategies, Learners’ Optimism, Educational Status, Duration of Learning and Demotivation

With the growth of more humanistic approaches towards teaching foreign languages, more emphasis has been put on learners’ feelings, emotions and individual differences. One of the issues in teaching and learning English as a foreign language is demotivation. The purpose of this study was to investigate the relationship between the components of language learning strategies, optimism, duration o...

Journal

Journal title: Big Data and Cognitive Computing

Year: 2022

ISSN: 2504-2289

DOI: https://doi.org/10.3390/bdcc6020040